
/*==============================================================================
PART 2A: Deposit Beta Estimation
==============================================================================
Purpose:
This script estimates bank-specific deposit betas for several historical periods
marked by distinct monetary policy cycles.

Input:
- $path_clean/call_reports_prebeta.dta (Prepared bank-quarter panel before sample restrictions)
- $path_clean/sample_2022q4.dta (List of banks from final sample after sample restrictions)
- $path_clean/call_reports_prepared.dta (Prepared bank-quarter panel after sample restrictions)
- FRED data on interest rates and CPI ($path_raw/)

Output:
- A Stata dataset in long panel form containing bank-period observations.
  Each observation identifies `rssdid`
  and `period` (the name of the estimation period). It includes the estimated
  deposit betas (`beta_depexp`, `beta_ins`, `beta_unins`) and the period-average
  uninsured share (`uninsuredsh_domdep_avg`) for that specific period.

 **PERIOD NAMING CONVENTION:**
  NOTE: The period name indicates the analysis period into which the data will be merged.
  For example, betas estimated over 2015q4-2019q2 are labeled "dec2021" because
  they are merged with asset values, deposit values, etc. from 2021q4.
  Note that jun2019 is not an analysis period; it is kept in this do-file for clarity.

  - "jun2019": Uses 2019q2 as end quarter, represents the sample used for dec2021 beta estimation
  - "dec2021": Uses 2021q4 as end quarter, betas inherited from jun2019 period.
  - "feb2023": Uses 2022q4 as end quarter
  - "feb2024": Uses 2023q4 as end quarter

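  Example structure (simplified; `yq` and the jun2019 row are shown for
  illustration only and are dropped before the dataset is saved):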
     +-----------------------------------------------------------------+
     | rssdid       yq      period   beta_depexp   beta_ins   bet~unins |
     |-----------------------------------------------------------------|
  1. |   6329   2019q2   jun2019      .2405995   .1500000    .3000000 |
  2. |   6329   2021q4   dec2021      .2405995   .1500000    .3000000 |
  3. |   6329   2022q4   feb2023      .0626449   .0400000    .0800000 |
  4. |   6329   2023q4   feb2024      .3556415   .2500000    .4000000 |
     +-----------------------------------------------------------------+

Methodology:
Deposit betas are calculated as the ratio of the change in a bank's deposit
rate to the change in the Effective Federal Funds Rate (FFR) over a specific
historical period.
The formula is:
  beta = (deposit_rate_t2 - deposit_rate_t1) / (ffr_t2 - ffr_t1)

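Where t1 and t2 are the start and end quarters of the estimation period.
Negative betas are censored at zero, and the result is scaled by a
period-specific, externally calibrated multiplier (1 for dec2021).
The calculated beta for a given period is a single value per bank, which is
then applied to all quarters for that bank.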
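
Worked example (hypothetical numbers, for illustration only; not taken from the data):
  Suppose a bank's deposit rate rises from 0.20% to 0.80% while the FFR rises
  from 0.50% to 2.50% between t1 and t2. Then
    beta = (0.80 - 0.20) / (2.50 - 0.50) = 0.60 / 2.00 = 0.30
  A negative ratio would be set to zero. Scaling by the feb2023 multiplier
  0.3089/0.2291 (about 1.348) would give a stored value of roughly
    0.30 x 1.348 = 0.404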

Last updated: Aug 9, 2025
==============================================================================*/
display "--- Starting Part 2A: Deposit Beta Estimation ---" // Indicate the start of the script

/*===============================================================================
 Setup
===============================================================================*/
// Load the prepared Call Report data.
use "$path_clean/call_reports_prebeta.dta", clear

// Declare data as panel
sort rssdid yq
tsset rssdid yq

/*===============================================================================
 Step 1: Merge Macroeconomic Data
===============================================================================*/
// Purpose: Augment the bank panel with quarterly macroeconomic data from FRED,
// which is required for the beta calculation (FFR) and other analyses.

// Merge in quarterly Effective Federal Funds Rate (FFR)
merge m:1 yq using "$path_raw/fedfunds_avg.dta", keepusing(ffr) keep(master match) nogen

// Merge in 10-Year Treasury Rate
// If no suffix or default suffix, use raw data
if "${ir_suffix}" == "" | "${ir_suffix}" == "." {
	merge m:1 yq using "$path_raw/tenyearrate${ir_suffix}.dta", keepusing(dgs10 f1*) keep(master match) nogen
}
else { // Otherwise, use data from the temporary directory (for extensions)
	merge m:1 yq using "$path_temp/tenyearrate${ir_suffix}.dta", keepusing(dgs10 f1*) keep(master match) nogen
}

sort rssdid yq
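
// Optional sanity check (a sketch, not part of the original pipeline): warn if
// the FFR is missing in any start or end quarter used for beta estimation, since
// that would silently produce missing betas. The quarter list is copied by hand
// from the period definitions in Step 2 and must be kept in sync with them.
capture assert !missing(ffr) if inlist(yq, tq(2015q4), tq(2019q2), tq(2021q4), tq(2022q4), tq(2023q4))
if _rc != 0 display as error "Warning: ffr is missing in at least one beta start/end quarter"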

/*===============================================================================
 Step 2: Estimate Deposit Betas for Historical Periods
===============================================================================*/
// Purpose: Calculate deposit betas for the specified deposit rate variable over
// three distinct historical periods. Each beta is a single value per bank/period.

// Define local macros for the rate variable and periods of interest
// Period format: "name start_qtr end_qtr [multiplier_for_depexp]"
local ratevars depexp // The deposit rate variable; note the estimation loop below is hard-coded for depexp (rate variable depexprate)
local periods `" "dec2021 2015q4 2019q2 1" "feb2023 2021q4 2022q4 0.3089/0.2291" "feb2024 2021q4 2023q4 0.41/0.35" "'
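
// Illustration only (a sketch, safe to delete): how one period tuple is parsed.
// The fourth word is kept as text such as "0.3089/0.2291" and is evaluated as
// arithmetic only when it is substituted inside parentheses, as in the loop below.
// The ex_* local names are hypothetical and are not used elsewhere in this file.
local ex_period "feb2023 2021q4 2022q4 0.3089/0.2291"
local ex_name : word 1 of `ex_period'
local ex_mult : word 4 of `ex_period'
display "Example: period `ex_name' carries multiplier `ex_mult' = " (`ex_mult')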


// Loop through each defined period
foreach p of local periods {
    gsort rssdid yq

    // Parse the period definition into more readable local macros
    local name : word 1 of `p' // Period name (e.g., dec2021)
    local start_qtr : word 2 of `p' // Start quarter (e.g., 2015q4)
    local end_qtr : word 3 of `p' // End quarter (e.g., 2019q2)
    local multiplier_val : word 4 of `p' // Get the potential multiplier value

    display "Calculating beta for depexp in period `name' (`start_qtr' - `end_qtr')..."

    // Convert the quarter strings to Stata quarterly dates
    local dt1 = tq(`start_qtr')
    local dt2 = tq(`end_qtr')

    // Use a quiet block to suppress output from the gen, egen, and replace commands
    quietly {

        // For each bank, get the deposit rate and FFR at the start and end of the period
        by rssdid (yq): gen rate_start = depexprate if yq == `dt1'
        by rssdid (yq): gen rate_end   = depexprate if yq == `dt2'
        by rssdid (yq): gen ffr_start  = ffr if yq == `dt1'
        by rssdid (yq): gen ffr_end    = ffr if yq == `dt2'

        // Propagate these start/end values to all observations for each bank
        by rssdid (yq): egen rate_s = max(rate_start)
        by rssdid (yq): egen rate_e = max(rate_end)
        by rssdid (yq): egen ffr_s  = max(ffr_start)
        by rssdid (yq): egen ffr_e  = max(ffr_end)

        // Calculate the deposit beta as a ratio of changes: (Change in Rate) / (Change in FFR)
        gen beta_depexp_`name' = (rate_e - rate_s) / (ffr_e - ffr_s)

        // Censor beta at zero
        replace beta_depexp_`name' = 0 if beta_depexp_`name' < 0 & !missing(beta_depexp_`name')

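        // Apply the externally calibrated multiplier for this period (a multiplier of 1 leaves the beta unchanged)
        // "noisily" is needed because display is otherwise suppressed inside the quietly block
        noisily display "Applying multiplier `multiplier_val' to beta_depexp_`name'"
        replace beta_depexp_`name' = (`multiplier_val') * beta_depexp_`name'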

        // Drop temporary variables created in this loop
        drop rate_start rate_end ffr_start ffr_end rate_s rate_e ffr_s ffr_e
    }
}

sort rssdid yq

// Calculate period-average uninsured shares (needed for decomposition)
// Note: the mean is computed among any existing yq within the sample period (even if some missing)
foreach p of local periods {
    local name : word 1 of `p'
    local start_qtr : word 2 of `p'
    local end_qtr : word 3 of `p'

    display "Calculating average uninsured share for period `name' (`start_qtr' - `end_qtr')..."
    local dt1 = tq(`start_qtr')
    local dt2 = tq(`end_qtr')

    gen temp_period_`name' = 1 if inrange(yq, `dt1', `dt2')
    gen uninsuredsh_domdep_`name' = uninsuredsh_domdep * temp_period_`name'
    by rssdid: egen uninsuredsh_domdep_avg`name' = mean(uninsuredsh_domdep_`name')
    drop temp_period_`name' uninsuredsh_domdep_`name'
}
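
// Optional sanity check (a sketch; it assumes uninsuredsh_domdep is a fraction
// between 0 and 1, which this file does not itself guarantee).
foreach n in dec2021 feb2023 feb2024 {
    capture assert inrange(uninsuredsh_domdep_avg`n', 0, 1) | missing(uninsuredsh_domdep_avg`n')
    if _rc != 0 display as error "Warning: uninsuredsh_domdep_avg`n' has values outside [0, 1]"
}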

//===============================================================================
// Step 3: Create Long Format Panel
//===============================================================================
// Purpose: Reshape the dataset to a long format, keeping only the end quarters
// of the defined periods as observations, and assign period identifiers.

// Convert quarter strings to dates for filtering
local dt_2019q2 = tq(2019q2)
local dt_2021q4 = tq(2021q4)
local dt_2022q4 = tq(2022q4)
local dt_2023q4 = tq(2023q4)

// Keep only observations corresponding to the end quarters of the defined periods
keep if yq == `dt_2019q2' | yq == `dt_2021q4' | yq == `dt_2022q4' | yq == `dt_2023q4'

// Create a string variable to identify the period for each observation
gen period = ""

// Assign period names based on the end quarter
replace period = "jun2019" if yq == `dt_2019q2' // Beta estimation sample for dec2021
replace period = "dec2021" if yq == `dt_2021q4' // Betas for dec2021 analysis period - inherited from jun2019 + filled if missing
replace period = "feb2023" if yq == `dt_2022q4'
replace period = "feb2024" if yq == `dt_2023q4'

// Keep relevant variables for the long format dataset
keep rssdid yq period beta_* uninsuredsh_domdep_avg* uninsuredsh_domdep

// Reshape beta variables to a single variable 'beta_depexp' based on the period
// Create the single average uninsured share variable once, outside the loop,
// so that adding a second rate variable to ratevars does not trigger an "already defined" error
gen uninsuredsh_domdep_avg = .

foreach v of local ratevars {
    gen beta_`v' = .

    // Assign the correct period-specific beta to the generic beta variable
    // Both jun2019 and dec2021 periods use the beta estimated on the jun2019 sample (beta_depexp_dec2021)
    replace beta_`v' = beta_`v'_dec2021 if period == "jun2019"
    replace beta_`v' = beta_`v'_dec2021 if period == "dec2021"
    replace beta_`v' = beta_`v'_feb2023 if period == "feb2023"
    replace beta_`v' = beta_`v'_feb2024 if period == "feb2024"
}

// Assign the correct period-specific average uninsured share to the generic variable
replace uninsuredsh_domdep_avg = uninsuredsh_domdep_avgdec2021 if period == "jun2019"
replace uninsuredsh_domdep_avg = uninsuredsh_domdep_avgdec2021 if period == "dec2021"
replace uninsuredsh_domdep_avg = uninsuredsh_domdep_avgfeb2023 if period == "feb2023"
replace uninsuredsh_domdep_avg = uninsuredsh_domdep_avgfeb2024 if period == "feb2024"

// Drop the original period-specific beta and average uninsured share variables
drop beta_*_dec2021 beta_*_feb2023 beta_*_feb2024 ///
     uninsuredsh_domdep_avgdec2021 uninsuredsh_domdep_avgfeb2023 uninsuredsh_domdep_avgfeb2024
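
// Optional sanity check (a sketch, not part of the original pipeline): each bank
// should now appear at most once per period.
capture isid rssdid period
if _rc != 0 display as error "Warning: rssdid and period do not uniquely identify observations"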

//===============================================================================
// Step 4: Decomposition Regression and Calculation (Long Format)
//===============================================================================
// Purpose: Run decomposition regressions for each independent period (jun2019,
// feb2023, feb2024) and calculate insured/uninsured betas in the long format dataset.

// Apply sample restrictions
merge m:1 rssdid using "$path_clean/sample_2022q4.dta", keep(3) nogen
drop if (rssdid==2697963 | rssdid==3664588) & (yq==tq(2023q4)) // Drop the 2023q4 observations of two banks that are in the 2022q4 sample list but are excluded from the final sample due to a low deposits-to-assets ratio

// Prepare variables
eststo clear // Clear any stored estimation results

gen beta_ins = . // Initialize variable for insured deposit beta
gen beta_unins = . // Initialize variable for uninsured deposit beta

// Run decomposition regressions independently for each specified period
foreach p_name in jun2019 feb2023 feb2024 {
    display "Running decomposition regression for period `p_name'..."

    // Regression: total deposit beta = alpha + gamma * uninsured share + epsilon
    // The coefficient gamma on uninsuredsh_domdep_avg estimates the difference between uninsured and insured betas.
    reg beta_depexp uninsuredsh_domdep_avg if period == "`p_name'"
    estimates store reg_`p_name' // Store regression results
    estimates save "$path_temp/beta_reg_`p_name'${ext_suffix}.ster", replace // Save regression results to file

    // Calculate temporary insured and uninsured betas based on regression coefficients
    gen b_ins_temp = beta_depexp - (_b[uninsuredsh_domdep_avg] * uninsuredsh_domdep_avg) if period == "`p_name'"
    gen b_unins_temp = b_ins_temp + _b[uninsuredsh_domdep_avg] if period == "`p_name'"
  
    // Winsorize the calculated insured and uninsured betas within the current period at the 5th and 95th percentiles
    // (winsor2 is a user-written command; install it with: ssc install winsor2)
    quietly winsor2 b_ins_temp if period == "`p_name'", cut(5 95) replace
    quietly winsor2 b_unins_temp if period == "`p_name'", cut(5 95) replace

    // Update the main beta_ins and beta_unins variables with the winsorized values for this period
    replace beta_ins = b_ins_temp if period == "`p_name'" & !missing(b_ins_temp)
    replace beta_unins = b_unins_temp if period == "`p_name'" & !missing(b_unins_temp)

    drop b_ins_temp b_unins_temp // Drop temporary variables
    estimates drop reg_`p_name' // Drop stored regression results from memory
}
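
// Why the decomposition works (illustration only, safe to delete). If a bank's
// total beta is the share-weighted average of its insured and uninsured betas,
//   beta = (1 - s) x beta_ins + s x beta_unins = beta_ins + s x (beta_unins - beta_ins),
// then the slope g on the uninsured share s estimates (beta_unins - beta_ins), so
//   beta_ins = beta - g x s   and   beta_unins = beta_ins + g.
// The numbers below are hypothetical and the ex_* locals are not used elsewhere.
local ex_beta  = 0.30   // hypothetical total deposit beta
local ex_share = 0.40   // hypothetical uninsured share
local ex_slope = 0.25   // hypothetical regression slope
local ex_ins   = `ex_beta' - `ex_slope' * `ex_share'
local ex_unins = `ex_ins' + `ex_slope'
display "Example: beta_ins = " `ex_ins' ", beta_unins = " `ex_unins'
// Before winsorizing, the share-weighted average of the two recovers the total beta
display "Example: weighted average = " ((1 - `ex_share') * `ex_ins' + `ex_share' * `ex_unins')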

/*==============================================================================
BETA INHERITANCE AND MEAN-FILLING BLOCK
==============================================================================
In this section we:
- Assign the 2019q2 (jun2019) betas to 2021q4, since these are the betas used in the dec2021 analysis period.
- Where a beta cannot be assigned (the bank is present in 2021q4 but not in 2019q2), fill in the jun2019 sample mean of beta_depexp, beta_ins, and beta_unins
  (applies to only one bank in dec2021).
==============================================================================*/

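// Step 1: Inherit beta_ins and beta_unins from the jun2019 observation to the dec2021 observation of the same bank (if present in both).
// beta_depexp for dec2021 was already assigned from the jun2019 estimation sample in Step 3.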
bysort rssdid (yq): replace beta_ins = beta_ins[_n-1] if period == "dec2021" & period[_n-1] == "jun2019"
bysort rssdid (yq): replace beta_unins = beta_unins[_n-1] if period == "dec2021" & period[_n-1] == "jun2019"

// Restrict to bank-quarters present in the final prepared panel (after sample restrictions)
merge 1:1 rssdid yq using "$path_clean/call_reports_prepared.dta", keep(3) nogen

// Step 2: Fill any remaining missing beta values in jun2019 and dec2021 periods with the mean beta from the jun2019 sample
foreach v in beta_depexp beta_ins beta_unins {
    // Calculate the mean beta from the jun2019 sample
    quietly sum `v' if period == "jun2019"
    local mean_jun2019 = r(mean)

    // Fill missing values in jun2019 with the calculated mean
    quietly count if missing(`v') & period == "jun2019"
    display "Filling `v' missing values in jun2019 for " r(N) " banks with mean (" `mean_jun2019' ")"
    replace `v' = `mean_jun2019' if missing(`v') & period == "jun2019"

    // Fill missing values in dec2021 with the mean from the jun2019 sample (as per original logic)
    quietly count if missing(`v') & period == "dec2021"
    display "Filling `v' missing values in dec2021 for " r(N) " banks with mean (" `mean_jun2019' ")"
    replace `v' = `mean_jun2019' if missing(`v') & period == "dec2021"
}
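
// Optional sanity check (a sketch, not part of the original pipeline): after
// inheritance and mean-filling, no dec2021 observation should have a missing beta.
capture assert !missing(beta_depexp, beta_ins, beta_unins) if period == "dec2021"
if _rc != 0 display as error "Warning: some dec2021 observations still have missing betas"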


//===============================================================================
// Step 5: Save Final Dataset
//===============================================================================
// Purpose: Save the resulting dataset containing bank-period observations with
// estimated deposit betas and average uninsured shares.

// Drop the jun2019 (2019q2) period, as it is not an analysis period.
drop if period=="jun2019"

// Keep only the essential variables for the output dataset
keep rssdid period beta* uninsuredsh_domdep_avg

// Sort the final dataset
sort rssdid period

// Save the dataset to the temporary directory
save "$path_temp/deposit_betas${ext_suffix}.dta", replace
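
// Optional (a sketch, safe to delete): list the variables of the file just saved
// to confirm the output layout (rssdid, period, beta_*, uninsuredsh_domdep_avg).
describe using "$path_temp/deposit_betas${ext_suffix}.dta"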


display "--- Deposit beta estimation completed ---" // Indicate the completion of the script


